TITLE: Signal processing resource with sample-by-sample selective characteristics



Abstract Paragraph : 

A signal processing resource system with multiple sets of coefficients, channel
context memories, and configuration control logic sets organized into signal
processing personalities which are multiplexed in their use according to input data
organization. Adaptable signal processing characteristics, processing suspension,
processing resumption, and seeding of signal processing context are provided. Control
logic allows a data stream to be processed using multiple signal processing
characteristics or "personalities" according to associations or groupings of
coefficient, channel context, and control logic sets.

Summary of Invention Paragraph : 

[0002] This invention relates to, but is not limited to, the field of embedded
signal processing resources.

Summary of Invention Paragraph : 

[0006] There are many applications of image and signal processing which require
more microprocessing bandwidth than is available in a single processor at any given
time. As microprocessors improve and their operating speeds increase, application
demands continue to meet or exceed the ability of a single processor. For example,
there are certain size, weight and power requirements to be met by processor modules
or cards which are deployed in military, medical and commercial end-use
applications, such as a line replaceable unit ("LRU") for use in a signal processing
system onboard a military aircraft. These requirements typically limit a module or
card to a maximum number of microprocessors and support circuits which may be
incorporated onto the module, due to the power consumption and physical packaging
dimensions of the available microprocessors and their support circuits (memories,
power regulators, bus interfaces, etc.).

Summary of Invention Paragraph : 

[0007] As such, a given module design or configuration with a given number of
processors operating at a certain execution speed will determine the total
bandwidth and processing capability of the module for parallel and distributed
processing applications such as image or signal processing. Thus, as a matter of
practicality, whether a particular application can be ported to a specific module
is determined based upon these parameters. Any applications which cannot be
successfully ported to the module, usually because they require a higher processing
bandwidth than is available on the module, are implemented elsewhere, such as on
mini-supercomputers.

Summary of Invention Paragraph : 

[0010] For many years, this led the military to design specialized multi-processor 
modules which were optimized for a particular application or class of applications, 
such as radar signal processing, infrared sensor image processing, or 
communications signal decoding. A module designed for one class of applications, 
such as a radar signal processing module, may not be suitable for use in another 
application, such as signal decoding, due to architecture optimizations for the one 
application which are detrimental to other applications. 
Summary of Invention Paragraph :

[0012] This has given rise to a new market within the military hardware supplier
industry, causing competition to develop and offer improved generalized
multi-processor architectures which are capable of hosting a wide range of software
applications. In order to develop an effective general hardware architecture for a
multi-processor board for multiple applications, one first examines the common
needs or nature of the array of applications. Most of these types of applications
work on two-dimensional data. For example, in one application, the source data may
represent a 2-D radar image, and in another application, it may represent 2-D
magnetic resonance imaging. Thus, it is common to break the data set into portions
for processing by each microprocessor. Take an image which is represented by an
array of data consisting of 128 rows and 128 columns of samples. When a feature
recognition application is ported to a quad processor module, each processor may be
first assigned to process 32 rows of data, and then to process 32 columns of data.
In signal processing parlance this is known as "corner turning". Corner turning is
a characteristic of many algorithms and applications, and therefore is a common
issue to be addressed in the interprocessor communications and memory arrangements
for multi-processor boards and modules.
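The row-then-column redistribution described above can be sketched in Python. This is a minimal illustration, assuming a 128 x 128 array held as nested lists and four processors numbered 0 to 3; the function names are hypothetical and do not come from the source.

```python
ROWS = COLS = 128
NUM_PROCS = 4


def make_image(rows=ROWS, cols=COLS):
    """Synthetic image in which each sample encodes its own (row, col) position."""
    return [[r * cols + c for c in range(cols)] for r in range(rows)]


def row_partition(image, procs=NUM_PROCS):
    """First pass: give each processor a contiguous band of rows (32 rows here)."""
    band = len(image) // procs
    return [image[p * band:(p + 1) * band] for p in range(procs)]


def corner_turn(partitions):
    """Second pass: redistribute the row bands so each processor holds a band of columns.

    Every processor contributes part of its rows to every other processor, which is
    why corner turning stresses the interprocessor paths. Each gathered column is
    stored as one contiguous list, i.e. the data is transposed.
    """
    procs = len(partitions)
    cols = len(partitions[0][0])
    band = cols // procs
    turned = []
    for dest in range(procs):
        first, last = dest * band, (dest + 1) * band
        turned.append([[row[c] for part in partitions for row in part]
                       for c in range(first, last)])
    return turned


image = make_image()
bands = row_partition(image)
columns = corner_turn(bands)
print(len(bands[0]), len(columns[0]))  # 32 32
```

In a real module the transpose happens as memory-to-memory transfers between processor nodes; the nested lists only stand in for those local memories.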

Summary of Invention Paragraph : 

[0026] The related patent applications establish that our new multiprocessor
architecture for distributed and parallel processing of data, which provides optimal
data transfer performance between processors and their local memories, from 
processor to processor, and from processors to module inputs and outputs, satisfies 
many needs in the art. Our new arrangement or architecture provides maximum 
performance when accessing local memory as well as nominal performance across other 
data transfer paths. Further, the related applications establish that our 
architecture is useful for realization with any high speed microprocessor family or 
combination of microprocessor models, including those microprocessors which are 
commonly used for control or signal processing applications and which exhibit I/O 
data transfer constraints relative to processing bandwidth. Our systems and methods 
described in the related patent applications addressed these needs, and are 
summarized in the following paragraphs. 

Summary of Invention Paragraph : 

[0028] In order to maximize the capabilities of our system, it was desirable to 
extend the functionality of the multiprocessor array to utilize the programmable 
logic arrays to actually perform some level of processing, and especially signal 
processing, on the data stored in the processor memories and the data which flows 
through the logic array. 

Summary of Invention Paragraph : 

[0029] Programmable logic device suppliers such as Xilinx have promoted use of 
their devices to perform signal processing functions in hardware rather than using
the traditional software or microprocessor-based firmware solutions. Thus, the 
combination of the location of the programmable logic in the topology of our system 
disclosed in the related patent applications and the availability of signal 
processing "macros" and designs for programmable logic produced an opportunity to 
embed signal processing in the new multiprocessor topology, thereby increasing the
density of functionality and capability of the new architecture. 

Summary of Invention Paragraph : 

[0030] Additionally, we have also added a capability to our systems, methods, and
architectures which allows these embedded signal processing functions to provide a
selectable set of processing characteristics which are activated on a
sample-by-sample basis, thereby enabling a multiplexed use of the "hardware" or
internal FPGA resources over time.

Summary of Invention Paragraph : 
[0031] A system and method are provided for sample-by-sample selectable
characteristics of embedded signal processing resources useful in cooperation with
a processor system such as, for example, a quad-processor arrangement having six
interprocessor communications paths, one direct communication path between each of
the six possible pairs of processors, with signal processing functions embedded in
the communications paths as disclosed in the related patent applications.
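The figure of six paths follows from giving every unordered pair of the four processors its own direct path. A short Python check of that arithmetic (the function name is illustrative only):

```python
from itertools import combinations


def interprocessor_paths(num_procs):
    """One direct path per unordered pair of processors: n * (n - 1) / 2 paths."""
    return list(combinations(range(num_procs), 2))


paths = interprocessor_paths(4)
print(len(paths), paths)
# 6 [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
```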

Brief Description of Drawings Paragraph : 

[0033] FIG. 2 provides additional detail of an internal architecture of the field 
programmable gate array for a processing node of the architecture as shown in FIG. 
1. 

Brief Description of Drawings Paragraph : 

[0034] FIG. 3 shows a signal processing framework contained within the field 
programmable gate array of FIG. 2. 

Brief Description of Drawings Paragraph : 

[0042] FIG. 11 depicts an alternate personality multiplexing scheme including a
downsampling operation and parallel signal processing functions.

Detail Description Paragraph : 

[0045] In one possible embodiment, our architecture is realized using four Motorola 
PowerPC™ G4 processors in the data transfer path topology as disclosed in the
related patent application. However, it will be recognized by those skilled in the 
art that the architecture and arrangement of our system may be realized using a 
variety of high speed microprocessor families or combinations of microprocessor 
models, including but not limited to those which are commonly used for control or 
signal processing applications and those which exhibit I/O data transfer 
constraints relative to processing bandwidth. 

Detail Description Paragraph : 

[0046] The field programmable logic of one possible embodiment which is responsible 
for data path functions is extended to include a signal processing framework within 
the data path. As such, this programmable logic can be configured and used as a 
signal processing resource in conjunction or cooperation with the software 
capabilities of the microprocessors. 

Detail Description Paragraph : 

[0049] Turning to FIG. 1, the module architecture according to the preferred
embodiment provides four processor nodes (11, 12, 13, 14), each node containing a
member of the Motorola PowerPC™ family of microprocessors and associated support
circuitry. Each of the processors is interfaced to an external level 2 (L2) cache
memory, as well as a programmed field programmable gate array (FPGA) device (17).

Detail Description Paragraph : 

[0062] As the interprocessor or node-to-node communications path interconnects are
implemented by buffering and control logic contained in the FPGA programs, and as
this particular embodiment utilizes a "hot programmable" FPGA such as the
Xilinx XCV1600-8-FG1156™, the quad processor module can be reconfigured at
two critical times:

Detail Description Paragraph : 

[0071] The communication paths between the processor nodes are defined by the
programmed FPGA devices (17) in this exemplary embodiment. Each FPGA device
provides full 64-bit data and 32-bit address connections to the two memory banks
local to it, in the preferred embodiment. The three paths from the local processor to
non-local memory (e.g. other processor nodes' local memories) are also 32 bits
wide, and are write-only, optimized for addressing the corner-turn processing
function in two-dimensional signal processing. Alternate embodiments, of course,
may use other types of logic such as ASICs or co-processors, and may employ various
data and address bus widths.
Detail Description Paragraph : 

[0082] Signal Processing Functions Configurably Embedded in Communications Paths
Detail Description Paragraph : 

[0083] In this exemplary embodiment, the FPGA (17) is configured to include the
signal processing node (25) as shown in FIG. 2. The FPGA (17) is configured to have
one or two PCI bus interfaces (21a, 21b), a direct memory access ("DMA") interface
(22a, 22b, 22c) to each of the other processing nodes of the module, as well as
internal bus selectors (26a, 26b) to the memory banks (16).

Detail Description Paragraph : 

[0086] With this addition of functionality to the FPGAs, our Matched Heterogeneous
Array Topology Signal Processing System ("MHAT") is realized. One or more signal
processing functions may be loaded into the DSP node (25) so as to allow data to be
processed prior to storing in the memory banks (16). MHAT provides a marriage of
the microprocessors and the FPGAs to facilitate simultaneous data processing and
data reorganization, which reduces real-time operating system interrupt overhead
processing and complexity.

Detail Description Paragraph : 

[0087] Turning to FIG. 3, the internal architecture of a DSP node (25) which 
provides a framework for hosting a variety of signal processing functions (35) is 
shown. The signal processing functions may include operations such as FIR filters, 
digital receivers, digital down converters, fast Fourier transforms ("FFT"), QR 
decomposition, time-delay beamforming, as well as other functions. 

Detail Description Paragraph : 

[0088] Two input data ports (38a, 38b) are provided, each of which receives data into
an asynchronous first-in first-out ("FIFO") buffer (31a, 31b). The data may then be
multiplexed, formatted, and masked (33a), and optionally digitally down converted
(33b) prior to being received into the signal processing logic (35).

Detail Description Paragraph : 

[0089] After being processed by the signal processing logic (35), the data may
again be formatted, converted from fixed point representation to floating point
representation (36), and then loaded into an output asynchronous FIFO for
eventual output to the output data port (39).
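The input-to-output flow of paragraphs [0088] and [0089] can be modeled as a chain of stages. The sketch below is a behavioral model only, assuming 16-bit two's-complement samples in a Q1.15 fixed-point format and a single input port; the word width, number format, and function names are assumptions rather than details given in the text, and the optional digital down conversion stage is omitted.

```python
from collections import deque

WORD_MASK = 0xFFFF      # assumed 16-bit input words
Q15_SCALE = 1 << 15     # assumed Q1.15 fixed-point format


def format_and_mask(word):
    """Format/mask stage: keep 16 bits and sign-extend to a Python int."""
    word &= WORD_MASK
    return word - 0x10000 if word & 0x8000 else word


def fixed_to_float(value):
    """Fixed point to floating point conversion stage."""
    return value / Q15_SCALE


def dsp_node(input_fifo, process):
    """Drain the input FIFO through format -> process -> convert into an output FIFO."""
    output_fifo = deque()
    while input_fifo:
        sample = format_and_mask(input_fifo.popleft())
        output_fifo.append(fixed_to_float(process(sample)))
    return output_fifo


# A placeholder "signal processing function" that halves each sample.
print(list(dsp_node(deque([0x4000, 0xC000]), lambda x: x // 2)))  # [0.25, -0.25]
```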

Detail Description Paragraph : 

[0090] FIG. 4 provides more details of an FIR building block (40) which may be
configured into a portion of the signal processing logic (35). Data which is
received (48) from the previous building block or from the signal processing logic
input formatters and digital down converters is received into the data memory (41).
The data may then be multiplied (45) by coefficients stored in coefficient memory
(43) and summed (46) with previous summation results or with summation results (44)
from other building blocks (401, 402), and the results of these operations are
stored in channel memory (49).
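A behavioral sketch of the building block helps tie the numbered elements together. It assumes a direct-form FIR with the newest sample first in the data memory, and it models the summation input (44) as an optional argument; the class and method names are hypothetical, and the control memory (42) is not modeled.

```python
class FIRBuildingBlock:
    """Behavioral model of one FIR building block (40)."""

    def __init__(self, coefficients):
        self.coefficient_memory = list(coefficients)            # (43), loaded by the processor
        self.data_memory = [0] * len(self.coefficient_memory)   # (41), newest sample first
        self.channel_memory = 0                                  # (49), latest result

    def step(self, sample, sum_in=0):
        """Accept one input sample (48); return (sum_out, data_out)."""
        data_out = self.data_memory[-1]                          # oldest sample leaves the block
        self.data_memory = [sample] + self.data_memory[:-1]
        # Multiply (45) each stored sample by its coefficient.
        products = (h * x for h, x in zip(self.coefficient_memory, self.data_memory))
        # Sum (46) the products together with the external summation input (44).
        self.channel_memory = sum_in + sum(products)
        return self.channel_memory, data_out


block = FIRBuildingBlock([1, 2, 3])
print([block.step(x)[0] for x in [1, 0, 0, 0]])  # impulse response: [1, 2, 3, 0]
```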

Detail Description Paragraph : 

[0093] As such, multiple building blocks may be cascaded by interconnecting data
inputs, data outputs, summation inputs, and summation outputs. Further, each
building block may be customized and configured to have specific properties or
characteristics as defined by the coefficients and control settings stored in the
control memory (42) and coefficient memory (43), which are loadable by the
microprocessor. In FIG. 5, a "sum out" connection arrangement (50) of such FIR
filter building blocks is shown. This may include a single real or complex FIR
filter (51), multiple filters (52), and digital down converters (53), as well as
other functions. With this arrangement, a series of signal processing operations
may be implemented which allows data to be processed in transit from one processing
node's local memory to the local memory banks of another processor.
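Cascading through both the data and summation connections lets several short blocks act as one longer filter. The following sketch assumes the same direct-form behavior for each block and uses hypothetical names; it splits an 8-tap FIR across two 4-tap blocks and checks the result against a direct computation.

```python
class FIRBlock:
    """Minimal building block with data-out and sum-in/sum-out connections."""

    def __init__(self, coefficients):
        self.coefficients = list(coefficients)
        self.delay_line = [0] * len(self.coefficients)   # newest sample first

    def step(self, data_in, sum_in=0):
        data_out = self.delay_line[-1]                    # sample shifted out to the next block
        self.delay_line = [data_in] + self.delay_line[:-1]
        sum_out = sum_in + sum(h * x for h, x in zip(self.coefficients, self.delay_line))
        return sum_out, data_out


def cascade(blocks, samples):
    """Feed each block's data_out and sum_out into the next block's inputs."""
    outputs = []
    for x in samples:
        data, acc = x, 0
        for block in blocks:
            acc, data = block.step(data, acc)
        outputs.append(acc)
    return outputs


def direct_fir(h, samples):
    """Reference: y[n] = sum over k of h[k] * x[n - k]."""
    return [sum(h[k] * samples[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(samples))]


taps = [1, 2, 3, 4, 5, 6, 7, 8]
signal = [1, 0, 0, 0, 0, 0, 0, 0, 0, 2, -1, 3]
chained = cascade([FIRBlock(taps[:4]), FIRBlock(taps[4:])], signal)
print(chained == direct_fir(taps, signal))  # True
```

The second block sees each sample four steps late because it receives the first block's shifted-out data, which is what lets it apply taps 4 through 7.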

Detail Description Paragraph : 

[0094] In FIG. 6, a "data out" or cascade connection arrangement (50') of signal
processing building blocks for a digital receiver is shown. In this example, a
demodulator (51) is followed by image rejection (52) functions, which are in turn
followed by bandwidth control functions (53), which are followed by the complex
equalizer (54). Similar to the discussion of FIG. 5, the embodiment or
implementation of an FIR filter is not restricted to the particular disclosure 
here, nor is the type of signal processing function restricted only to these 
particular blocks. Further, the topology of interconnected signal processing 
functions may take a variety of forms, combining series and parallel 
interconnections as needed for specific applications. 

Detail Description Paragraph : 

[0096] Turning to FIG. 7, the "RT_STAP" benchmark process used to measure the
performance and functional density of COTS processing modules is shown. This
particular process represents a task to find targets on a ground surface in a
signal set acquired from an airborne platform such as an airplane. The benchmark
process is designed to utilize various portions of processor modules (e.g. DMA,
memory busses, interrupts, etc.), such that it represents a broad measurement of a
processing module's capabilities. It also includes a mix of types of processes,
including simple sample-by-sample calculations on in-phase and quadrature ("I/Q")
data (73), followed by a pulse compression (74) correlation process, during which a
corner turning process must be performed to transpose a matrix (71), followed by
some Doppler processing (75), followed by a "QRD" function (76), which is an
equation solver for performing adaptive processing. These processes are each well
known in the art, and are commonly used within various mission profiles often
performed by such multiprocessor modules.

Detail Description Paragraph : 

[0098] This mission profile (78) may be met using 8 quad processor modules (80) of 
the type available on the market and previously described, five of which are 
dedicated to the initial processing functions, and three of which are dedicated to 
the latter processing functions, as shown in FIG. 8. 

Detail Description Paragraph : 

[0099] However, by enhancing the QuadPPC board to include the signal processing
functionality embedded into the interprocessor communication paths according to the
present invention, this entire mission profile may be realized using only 3 boards
or modules (81). This results in decreased failure rates by requiring less physical
hardware, decreased cost, and reduced system characteristics (e.g. weight,
dimensions, power, etc.). For airborne platforms, reductions in system
characteristics such as weight, size, and power translate to greater mission
range and increased aircraft performance and maneuverability.

Detail Description Paragraph : 

[0100] Multiple Personality Signal Processing Resource
Detail Description Paragraph : 

[0101] Turning to FIG. 9, another embodiment of the example FIR building block of
FIG. 4 is shown. With additional allocation of coefficient (i.e. parameter) memory
(43), channel memory (49), and control logic (42) (or subdivision of existing
memories), a number of coefficient memory sets (43'), channel memory sets (49'), and
even control logic sets (42') are provided. As such, prior to processing a given
sample, a particular channel memory set may be selected along with a set of
coefficients in a corresponding coefficient memory. For example, if the basic
configuration of the signal processing block is that of an anti-aliasing (e.g.
Nyquist) low pass filter ("LPF"), one coefficient set can be set for a rolloff at
2 MHz, while a second coefficient set can be set for a rolloff at 8 MHz. Then, two
different channels of data can be processed through the same physical FPGA
hardware by selecting the appropriate coefficient set according to which filter
characteristic is to be applied to the data samples currently being processed
through the resource. Additional control logic may be added such that, for example,
even-numbered samples are processed by the 2 MHz LPF and odd-numbered samples by the
8 MHz LPF. In this manner, the two channels of data can be interleaved (e.g.
multiplexed into a stream of a-b-a-b-a-b, etc.), and the filter resource will
process each sample accordingly. Other schemes of data organization and selection
of coefficient sets can be implemented as well, such as block processing (e.g. 1000
samples of one filter followed by 800 samples of another, etc.).
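The even/odd example can be sketched as one shared FIR datapath whose coefficient set and channel memory set are chosen per sample. This is an illustration under stated assumptions: the tap values below are placeholders, not actual 2 MHz or 8 MHz low pass designs, and the class and function names are hypothetical.

```python
class MultiPersonalityFIR:
    """One FIR datapath shared by several personalities.

    Each personality owns a coefficient set and a channel memory set holding its
    delay-line context; both are selected by a single personality index.
    """

    def __init__(self, coefficient_sets):
        self.coefficient_sets = [list(c) for c in coefficient_sets]
        self.channel_memory_sets = [[0.0] * len(c) for c in self.coefficient_sets]

    def step(self, sample, personality):
        coefficients = self.coefficient_sets[personality]
        context = [sample] + self.channel_memory_sets[personality][:-1]
        self.channel_memory_sets[personality] = context
        return sum(h * x for h, x in zip(coefficients, context))


def even_odd_control(stream, fir):
    """Hypothetical control logic: even-indexed samples use personality 0, odd use 1."""
    return [fir.step(x, n % 2) for n, x in enumerate(stream)]


lpf_2mhz = [0.25, 0.5, 0.25]   # placeholder taps standing in for the 2 MHz LPF
lpf_8mhz = [0.5, 0.5]          # placeholder taps standing in for the 8 MHz LPF
channel_a = [1.0, 2.0, 3.0, 4.0]
channel_b = [10.0, 0.0, -10.0, 5.0]
interleaved = [v for pair in zip(channel_a, channel_b) for v in pair]  # a-b-a-b-...
out = even_odd_control(interleaved, MultiPersonalityFIR([lpf_2mhz, lpf_8mhz]))
```

Because each personality keeps its own channel memory, `out[0::2]` matches filtering `channel_a` alone with `lpf_2mhz`, and `out[1::2]` matches filtering `channel_b` alone with `lpf_8mhz`.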

Detail Description Paragraph : 

[0102] During an operation such as this wherein the coefficients for the signal 
processing resource are selected according to a control scheme, the channel memory 
sets are also correspondingly selected. Each channel memory set provides a unique 
storage or buffer of intermediate values from the previous filter iteration for the 
previous sample (or sample block), and as such, remembers the "context" of the
filter from the last use of the filter with the corresponding coefficients. 
Contrary to traditional software practice wherein such context would have to be 
restored typically by many stack, memory, or pointer operations, our signal 
processing resource can select these coefficient sets, channel memory sets, and 
control sets in a single operation. 
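The context behavior can be illustrated with block-scheduled processing in which a single selection swaps coefficients and channel memory together. In this sketch (hypothetical names, placeholder taps, and short blocks standing in for the 1000- and 800-sample blocks), a personality that is set aside for a block of another personality's data resumes exactly where it left off.

```python
class Personality:
    """A coefficient set paired with its own channel memory (filter context)."""

    def __init__(self, coefficients):
        self.coefficients = list(coefficients)
        self.channel_memory = [0.0] * len(self.coefficients)


class PersonalityBank:
    def __init__(self, personalities):
        self.personalities = list(personalities)
        self.active = self.personalities[0]

    def select(self, index):
        # One assignment switches coefficients and context together;
        # nothing is pushed to or popped from a stack.
        self.active = self.personalities[index]

    def step(self, sample):
        p = self.active
        p.channel_memory = [sample] + p.channel_memory[:-1]
        return sum(h * x for h, x in zip(p.coefficients, p.channel_memory))


def process_blocks(bank, schedule):
    """schedule: list of (personality_index, block_of_samples) pairs."""
    out = []
    for index, block in schedule:
        bank.select(index)
        out.extend(bank.step(x) for x in block)
    return out


a = [float(v) for v in range(1, 11)]
b = [5.0, -5.0, 5.0, -5.0]
bank = PersonalityBank([Personality([0.5, 0.25, 0.25]), Personality([1.0, -1.0])])
mixed = process_blocks(bank, [(0, a[:6]), (1, b), (0, a[6:])])
alone = process_blocks(PersonalityBank([Personality([0.5, 0.25, 0.25])]), [(0, a)])
print(mixed[:6] + mixed[10:] == alone)  # True
```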

Detail Description Paragraph : 

[0109] In another variation of the embodiment, additional logic (90) to adapt
coefficients stored in coefficient memory may be employed to realize adaptive signal
processing functions, such as adaptive filters and iterative convergent numerical
operations. This logic, too, may be provided in sets with the personalities of the
signal processing resource, with the adaptation logic sets being correspondingly
selected and used for each personality.
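The text does not name a particular adaptation rule. As one common example, the least-mean-squares (LMS) update can play the role of an adaptation logic set, with each personality keeping its own coefficients, channel memory, and step size. The sketch below uses hypothetical names, and the "desired" signal is generated from known reference filters purely for illustration; it interleaves two personalities on one stream and lets each adapt independently.

```python
import random


class AdaptivePersonality:
    """Coefficient set, channel memory, and an LMS adaptation rule for one personality."""

    def __init__(self, num_taps, step_size):
        self.coefficients = [0.0] * num_taps
        self.channel_memory = [0.0] * num_taps
        self.step_size = step_size

    def step(self, sample, desired):
        self.channel_memory = [sample] + self.channel_memory[:-1]
        y = sum(h * x for h, x in zip(self.coefficients, self.channel_memory))
        error = desired - y
        # LMS coefficient update: h <- h + mu * e * x
        self.coefficients = [h + self.step_size * error * x
                             for h, x in zip(self.coefficients, self.channel_memory)]
        return y, error


def adapt_interleaved(references, num_samples=4000, step_size=0.05, seed=0):
    """Alternate samples between personalities; personality k learns references[k]."""
    rng = random.Random(seed)
    personalities = [AdaptivePersonality(len(h), step_size) for h in references]
    histories = [[0.0] * len(h) for h in references]
    for n in range(num_samples):
        k = n % len(references)
        x = rng.uniform(-1.0, 1.0)
        histories[k] = [x] + histories[k][:-1]
        desired = sum(h * v for h, v in zip(references[k], histories[k]))
        personalities[k].step(x, desired)
    return [p.coefficients for p in personalities]


learned = adapt_interleaved([[0.5, -0.25], [0.1, 0.3, -0.2]])
```

Each personality converges toward its own reference filter even though both share one input stream, because the adaptation state travels with the personality.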

Detail Description Paragraph : 

[0110] As data which is received at the input of the signal processing block or 
blocks can be selectively processed by different signal processing personalities in 
the same hardware resource (e.g. different combinations of coefficient sets, 
control sets, and channel memory sets) on a sample-by-sample basis in our new 
system, considerable flexibility in the use of the signal processing resources is
afforded.

Detail Description Paragraph : 

[0113] To realize such a configuration or operation of the signal processing resource
(151), the control logic must be configured to select one of 4 different
coefficient sets and channel memory sets for every input sample, synchronized and
coordinated with a selected sample present or buffered from the input stream (153).
For example, channel A data would be processed using the control, coefficient and
channel memory for Filter A, with the control logic for Filter A (155) selecting
(153) Filter A's channel memory and coefficient memory only when channel A samples
A₁, A₂, . . . Aₙ (154) are being processed. Likewise, channel B
data would be processed using the control, coefficient memory and channel memory for
Filter B (157) when channel B samples B₁, B₂, . . . Bₙ (156)
are being processed, and similarly for Filter C (159) for channel C data (158) and
Filter D (1501) for channel D (1500) data.

Detail Description Paragraph : 

[0114] This illustrates the ability of the signal processing logic (151) to 
multiplex over time the usage or application of coefficients, channel memories, and 
control logic for individual samples, thus realizing a time-multiplexed personality 
of the signal processing resource. As will be evident at this point to those 
skilled in the art, other multiplexing schemes could be accommodated with different 
control logic, including but not limited to framed or packeted data streams (e.g. a
block of data from one channel followed by a block of data from another channel,
etc.).
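The control logic's job in these schemes reduces to producing one personality selection per input sample. The sketch below contrasts the four-channel interleaved scheme of paragraph [0113] with a framed scheme in which each frame announces its channel and length; the frame format, tap values, and function names are assumptions made for illustration.

```python
def round_robin_select(num_samples, num_channels=4):
    """Interleaved A-B-C-D-A-B-... stream: personality = sample index mod channel count."""
    return [n % num_channels for n in range(num_samples)]


def framed_select(frames):
    """Framed/packeted stream: each frame is a (channel_id, block_length) header."""
    selects = []
    for channel_id, length in frames:
        selects.extend([channel_id] * length)
    return selects


def run_personalities(stream, selects, coefficient_sets):
    """Apply the selected FIR personality to each sample, one context per personality."""
    contexts = [[0.0] * len(c) for c in coefficient_sets]
    out = []
    for x, k in zip(stream, selects):
        contexts[k] = [x] + contexts[k][:-1]
        out.append(sum(h * v for h, v in zip(coefficient_sets[k], contexts[k])))
    return out


filters = [[1.0, 1.0], [2.0], [1.0, -1.0], [0.5, 0.5]]   # placeholder Filters A-D
channels = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0], [1.0, 1.0, 1.0]]

interleaved = [channels[k][i] for i in range(3) for k in range(4)]
framed = [x for ch in channels for x in ch]

out_interleaved = run_personalities(interleaved, round_robin_select(12), filters)
out_framed = run_personalities(framed, framed_select([(k, 3) for k in range(4)]), filters)
```

Either way, channel A's outputs come out the same (`[1.0, 3.0, 5.0]` here), because the selection signal, not the arrival order, determines which coefficients and context each sample meets.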

Detail Description Paragraph : 

[0115] Additionally, successive data samples from the same data channel may be
processed by different signal processing resource personalities to realize an
undersampling function simultaneous with "parallel" signal processing by the
different personalities. For example, the multiplexed personality signal processing
system (160) of FIG. 11 may be realized by a variation of the embodiment of the
control logic (153') in which alternating samples from the same channel data (152')
are processed by two alternating filter (or other signal processing) functions A and
B (155, 157). For this example, assume that the original data sampling rate of
channel A is 128 million samples per second (128 Msa/sec), but neither filter
requires better than sample data rates of 32 Msa/sec to perform its function
with the desired accuracy. As such, the input data stream can be downsampled and
"shared" between the two filters by operating Filter A on "odd" numbered samples
(154') and Filter B on "even" numbered samples (156'). FIG. 11 shows that, in this
example, Filter A would then operate on samples A₁, A₃, A₅, . . .,
and Filter B would operate on samples A₂, A₄, A₆, etc.
This effectively downsamples the input stream to each filter to 64 Msa/sec, and
processes both downsampled streams (154', 156') in parallel over time.
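The sample split and the resulting rates can be checked with a few lines of Python. Sample labels follow the text's 1-based numbering (A1 is the first sample), so the "odd" samples sit at 0-based indices 0, 2, 4, and so on; the function names are illustrative only.

```python
INPUT_RATE_MSPS = 128.0     # channel A sampling rate from the example
REQUIRED_RATE_MSPS = 32.0   # minimum rate either filter needs


def split_alternating(samples):
    """Filter A takes samples 1, 3, 5, ... (1-based); Filter B takes 2, 4, 6, ..."""
    return samples[0::2], samples[1::2]


def per_personality_rate(input_rate, num_personalities=2):
    """Each of N alternating personalities sees 1/N of the input rate."""
    return input_rate / num_personalities


stream = [f"A{i}" for i in range(1, 9)]
filter_a, filter_b = split_alternating(stream)
rate = per_personality_rate(INPUT_RATE_MSPS)
print(filter_a, filter_b, rate, rate >= REQUIRED_RATE_MSPS)
# ['A1', 'A3', 'A5', 'A7'] ['A2', 'A4', 'A6', 'A8'] 64.0 True
```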

Detail Description Paragraph : 

[0116] The signal processing functions, of course, do not have to be limited to
filters as in the example, nor do the personality multiplexing schemes employed
have to be limited to just a few signal processing personalities, highly
patterned or repetitive data input streams, etc., as the control logic may be
defined to implement a much wider variety of more complex multiplexing schemes
which combine elements of the foregoing illustrations. For example, 4 signal
processing personalities (171) could be configured to operate in series on one
channel's data, while 3 other personalities (172, 173, 174) could be configured to
process in parallel some portion of the input data stream, as illustrated in
FIG. 12. In this personality multiplexing configuration (151''), the control logic
(153'') is also configured to process a portion of the input data (175) using 2
different processing functions E (172) and F (173). In other words, the same data
values input to processing function E are also input to processing function F. This
type of "copying" of data to multiple processing function personalities can be
expanded in alternate embodiments, taking on more of a "broadcast" nature within the
signal processing resource for even more complex personality multiplexing schemes.
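The two routing patterns in FIG. 12, a series chain and a broadcast of the same input to several personalities, can be sketched with personalities modeled as plain functions. The gains and offsets below are placeholders, not functions described in the text, and the names are hypothetical.

```python
def run_series(sample, stages):
    """Series chain: each personality's output feeds the next (as with the 4 personalities (171))."""
    for stage in stages:
        sample = stage(sample)
    return sample


def run_broadcast(sample, personalities):
    """Broadcast: the same input value goes to every listed personality (as with E and F)."""
    return [personality(sample) for personality in personalities]


def function_e(x):
    """Placeholder for processing function E (172)."""
    return 2 * x


def function_f(x):
    """Placeholder for processing function F (173)."""
    return x + 100


series_chain = [lambda x: x + 1, lambda x: 2 * x, lambda x: x - 3, lambda x: 10 * x]

print(run_series(1, series_chain))                  # ((1 + 1) * 2 - 3) * 10 = 10
print(run_broadcast(7, [function_e, function_f]))   # [14, 107]
```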

Detail Description Paragraph : 

[0118] In an embodiment option for the parameter port (34'') as shown in FIG. 13,
the port is adapted to load or deposit values (e.g. written by a microprocessor) into
channel memory as well. This provides several new capabilities to the multiplexing
of personalities and functionalities of the signal processing resource. First, it
allows the channel memory to be pre-loaded with a set of data values, such as
zeroes, for initialization.

Detail Description Paragraph : 

[0120] Resumption of processing can be on the same physical resource hardware, or
on different resource hardware. For example, the processor could perform a
certain amount of processing, suspend processing and save the channel memory
contents, and then transfer this information to a second processor, where the
channel memory could be loaded to resume processing on a different signal
processing hardware resource. This allows division of processing functionality
between different processing nodes, but preserves the ability to use the FPGA-based
signal processing resources as previously described, albeit distributed among
multiple FPGAs over time.
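Suspension and resumption amount to reading the channel memory out through the parameter port and loading it into another resource. The sketch below models that with hypothetical method names (`load_channel_memory`, `read_channel_memory`) and placeholder taps; it checks that processing split across two resources matches uninterrupted processing on one, and it also shows the zero pre-load of paragraph [0118].

```python
class FIRResource:
    """One signal processing resource whose parameter port can read and write channel memory."""

    def __init__(self, coefficients):
        self.coefficients = list(coefficients)
        self.channel_memory = [0.0] * len(self.coefficients)

    def load_channel_memory(self, values):
        """Parameter-port write: seed or restore the filter context."""
        if len(values) != len(self.channel_memory):
            raise ValueError("context length does not match this resource")
        self.channel_memory = list(values)

    def read_channel_memory(self):
        """Parameter-port read: save the filter context for later resumption."""
        return list(self.channel_memory)

    def step(self, sample):
        self.channel_memory = [sample] + self.channel_memory[:-1]
        return sum(h * x for h, x in zip(self.coefficients, self.channel_memory))


taps = [0.5, 0.3, 0.2]
data = [1.0, -2.0, 3.0, 0.5, 4.0, -1.0]

reference = FIRResource(taps)
uninterrupted = [reference.step(x) for x in data]

first = FIRResource(taps)
first.load_channel_memory([0.0] * len(taps))   # pre-load zeroes for initialization
part1 = [first.step(x) for x in data[:3]]
saved = first.read_channel_memory()             # suspend: save the context

second = FIRResource(taps)                      # e.g. a resource on another node
second.load_channel_memory(saved)               # resume: restore the context
part2 = [second.step(x) for x in data[3:]]
```

Concatenating `part1` and `part2` reproduces `uninterrupted`, since the saved channel memory carries the full filter state across the handoff.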
Detail Description Paragraph :

[0121] Additionally, the parameter port may be adapted to output contents of the 
coefficient memory, which may be especially useful for saving the context of an 
adaptive signal processing function in which the coefficients have been modified by 
the signal processing function after original loading of the coefficients by the 
microprocessor. This allows adaptive functions to be suspended and resumed (either 
on the same physical resource or another resource) as previously described related 
to the ability to output and save the contents of the channel memory. 



1. A configurable signal processing computation resource system comprising: a data
input for receiving a plurality of data samples, and having a data output; a
plurality of selectable channel memory sets, each channel memory set storing a set
of computation values for said computation resource; a plurality of selectable 
coefficient memory sets, each coefficient memory set storing a set of coefficients 
and parameters for said computation resource; a parameter port for providing values 
into said coefficient memory sets; and a control portion configured to select 
coefficient memory sets and channel memory sets coordinated with processing of 
input data samples such that signal processing function personalities are realized 
and applied to data samples according to a predetermined scheme. 

6. The system as set forth in claim 1 wherein said control portion is configured to 
select said signal processing function personalities according to a personality 
multiplexing scheme coordinated to an input data sample multiplexing scheme. 

10. The system as set forth in claim 1 wherein said control portion is adapted to 
multiplex signal processing function personalities to provide parallel processing 
functions.

11. The system as set forth in claim 1 wherein said control portion is adapted to
multiplex signal processing function personalities to provide series processing 
functions. 

12. The system as set forth in claim 1 wherein said control portion is adapted to 
multiplex signal processing function personalities to provide operation of a 
plurality of signal processing function personalities on equivalent input data 
sample values. 

13. The system as set forth in claim 1 wherein said configurable signal processing 
computation resource comprises a field programmable logic array. 

14. The system as set forth in claim 1 wherein said configurable signal processing 
computation resource comprises a programmable logic device. 

15. The system as set forth in claim 1 wherein said configurable signal processing 
computation resource comprises a programmable logic portion of a microprocessor. 




PGPUB-DOCUMENT-NUMBER: 20040117519
PGPUB-FILING-TYPE: new

DOCUMENT-IDENTIFIER: US 20040117519 A1

TITLE: Autonomous signal processing resource for selective series processing of 
data in transit on communications paths in multi-processor arrangements 

PUBLICATION-DATE: June 17, 2004 

INVENTOR-INFORMATION:

NAME                 CITY         STATE  COUNTRY  RULE-47

Smith, Winthrop W.   Richardson   TX     US

APPL-NO: 10/320078 [PALM]
DATE FILED: December 16, 2002 

INT-CL: [07] G06F 13/28

US-CL-PUBLISHED: 710/022
US-CL-CURRENT: 710/22

REPRESENTATIVE-FIGURES: 1



ABSTRACT:

A multi-processor arrangement having an interprocessor communication path between 
every possible pair of processors, in addition to I/O paths to and from the
arrangement, having signal processing functions configurably embedded in series 
with the communication paths and/or the I/O paths. Each processor is provided with 
a local memory which can be accessed by the local processor as well as by the other 
processors via the communications paths. This allows for efficient data movement 
from one processor's local memory to another processor's local memory, such as 
commonly done during signal processing corner turning operations. Configurable 
signal processing logic may be configured to host one or more signal processing 
functions which allow data to be autonomously accessed from the processor local 
memories, processed, and re-deposited in a local memory. 

CROSS-REFERENCE TO RELATED APPLICATIONS 

Claiming Benefit Under 35 U.S.C. 120 

[0001] This application is related to U.S. patent application Ser. No. 09/850,939, 
filed on May 8, 2001, docket number TFT2001-001, and to U.S. patent application 
Ser. No. 10/198,021, filed on Jul. 18, 2002, docket number TFT2002-001, both by 
Winthrop W. Smith. 
Summary of Invention Paragraph : 

[0006] This invention relates to the arts of signal processing, multi-processor 
architectures, and programmable logic. 

Summary of Invention Paragraph : 

[0008] There are many applications of image and signal processing which require 
more microprocessing bandwidth than is available in a single processor at any given 
time. As microprocessors are improved and their operating speeds increase, so too 
do application demands continue to meet or exceed the ability of a single 
processor. For example, there are certain size, weight and power requirements to be 
met by processor modules or cards which are deployed in military, medical and 
commercial end-use applications, such as a line replaceable unit ("LRU") for use in 
a signal processing system onboard a military aircraft. These requirements 
typically limit the number of microprocessors and support circuits which may be 
incorporated onto a module or card, due to the power consumption and physical 
packaging dimensions of the available microprocessors and their support circuits 
(memories, power regulators, bus interfaces, etc.). 

Summary of Invention Paragraph : 

[0009] As such, a given module design or configuration with a given number of 
processors operating at a certain execution speed will determine the total 
bandwidth and processing capability of the module for parallel and distributed 
processing applications such as image or signal processing. Thus, as a matter of 
practicality, these parameters determine whether a particular application can be 
ported to a specific module. Any applications which cannot successfully be ported 
to the module, usually because they require a higher processing bandwidth than is 
available on the module, are implemented elsewhere, such as on mini-supercomputers. 

Summary of Invention Paragraph : 

[0012] For many years, this led the military to design specialized multi-processor 
modules which were optimized for a particular application or class of applications, 
such as radar signal processing, infrared sensor image processing, or 
communications signal decoding. A module designed for one class of applications, 
such as a radar signal processing module, may not be suitable for use in another 
application, such as signal decoding, due to architecture optimizations for the one 
application which are detrimental to other applications. 

Summary of Invention Paragraph : 

[0014] This has given rise to a new market within the military hardware suppliers 
industry, causing competition to develop and offer improved generalized multi- 
processor architectures which are capable of hosting a wide range of software 
applications. In order to develop an effective general hardware architecture for a 
multi-processor board for multiple applications, one first examines the common 
needs or nature of the array of applications. Most of these types of applications 
work on two-dimensional data. For example, in one application, the source data may 
represent a 2-D radar image, and in another application, it may represent 2-D 
magnetic resonance imaging. Thus, it is common to break the data set into portions 
for processing by each microprocessor. Take an image which is represented by an 
array of data consisting of 128 rows and 128 columns of samples. When a feature 
recognition application is ported to a quad processor module, each processor may be 
first assigned to process 32 rows of data, and then to process 32 columns of data. 
In signal processing parlance this is known as "corner turning". Corner turning is a 
characteristic of many algorithms and applications, and therefore is a common issue 
to be addressed in the interprocessor communications and memory arrangements for 
multi-processor boards and modules. 
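A minimal sketch of the 128 x 128, four-processor corner turn described above, written in plain Python for illustration only. The helper names (`make_image`, `row_blocks`, `corner_turn`) and the test-image contents are hypothetical; on an actual module the redistribution would be carried out as memory-to-memory transfers between processor local memories rather than by list copying.

```python
N = 128   # samples per row and per column (the example image in the text)
P = 4     # processors on the quad module

def make_image(n):
    """Build an n x n test image where sample (r, c) holds r * n + c."""
    return [[r * n + c for c in range(n)] for r in range(n)]

def row_blocks(img, p):
    """First pass: give each of p processors a contiguous block of rows."""
    k = len(img) // p
    return [img[i * k:(i + 1) * k] for i in range(p)]

def corner_turn(blocks):
    """Second pass: redistribute so each processor owns a block of columns.

    Processor j gathers, from every processor's row block, the samples in
    columns j*k .. (j+1)*k - 1 and stores each column contiguously, which
    is the matrix transpose ("corner turn") between the two passes.
    """
    p = len(blocks)
    k = len(blocks[0][0]) // p          # columns per processor
    turned = []
    for j in range(p):
        columns = []
        for c in range(j * k, (j + 1) * k):
            col = []
            for block in blocks:        # one contribution from each processor
                col.extend(row[c] for row in block)
            columns.append(col)
        turned.append(columns)
    return turned

image = make_image(N)
turned = corner_turn(row_blocks(image, P))
```

After the turn, each processor holds 32 complete columns of 128 samples, so the column pass can proceed without further interprocessor traffic.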

Summary of Invention Paragraph : 

[0028] The related patent application established that there is a need in the art 
for a multiprocessor architecture for distributed and parallel processing of data 
which provides optimal data transfer performance between processors and their local 
memories, from processor to processor, and from processors to module inputs and 
outputs. In particular, there is a need in the art for this new arrangement to 
provide maximum performance when accessing local memory as well as nominal 
performance across other data transfer paths. Further, the related application 
established that there is a need in the art for this new architecture to be useful 
and advantageous for realization with any high speed microprocessor family or 
combination of microprocessor models, and especially those which are commonly used 
for control or signal processing applications and which exhibit I/O data transfer 
constraints relative to processing bandwidth. The invention described in the 
related patent application addressed these needs, and is summarized in the 
following paragraphs. 

Summary of Invention Paragraph : 

[0030] In order to maximize the capabilities of the related invention, it was 
desirable to extend the functionality of the multiprocessor array to utilize the 
programmable logic arrays to actually perform some level of processing, and 
especially signal processing, on the data stored in the processor memories and the 
data which flows through the logic array. 

Summary of Invention Paragraph : 

[0031] Programmable logic device suppliers such as Xilinx have promoted use of 
their devices to perform signal processing functions in hardware rather than using 
the traditional software or microprocessor-based firmware solutions. Thus, the 
combination of the location of the programmable logic in the topology of the 
invention disclosed in the related patent application and the availability of 
signal processing "macros" and designs for programmable logic produces an 
opportunity to embed signal processing in the new multiprocessor topology, thereby 
increasing the density of functionality and capability of the new architecture. 

Summary of Invention Paragraph : 

[0032] A quad-processor arrangement having six interprocessor communications paths, 
one direct communication path between each of the six possible pairs of processors, 
with signal processing functions embedded in the communications paths is disclosed. 
The embedded signal processing functions may also be utilized to process data as it 
is being moved into or out of the quad-processor arrangement. 
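The path count follows from counting unordered processor pairs. As a short check, writing P(n) (a label introduced here for illustration) for the number of direct paths among n fully connected processors:

$$
P(n) = \binom{n}{2} = \frac{n(n-1)}{2}, \qquad P(4) = \frac{4 \cdot 3}{2} = 6
$$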

Summary of Invention Paragraph : 

[0033] Each processor is provided with a local memory which can be accessed by the 
local processor as well as by the other processors via the communications paths, 
either by direct reading and writing operations by the processors or preferably via 
automatic memory-to-memory transfers using direct memory access ("DMA") engines. 
This allows for efficient data movement from one processor's local memory to 
another processor's local memory, such as commonly done during signal processing 
corner turning operations. 

Summary of Invention Paragraph : 

[0035] According to the present invention, the programmable logic is configured 
with an internal framework for hosting data processing functions, and especially 
digital signal processing, such that data traversing an interprocessor 
communication path or board bus can be processed in transit, whether the data is 
being moved from one local memory to another local memory using DMA, is being 
written or read by a DSP into a local memory, or is being moved into or out of the 
quad processor arrangement. 

Brief Description of Drawings Paragraph : 

[0038] FIG. 2 provides additional detail of the internal architecture of the field 
programmable gate array for a processing node of the architecture as shown in FIG. 
1. 

Brief Description of Drawings Paragraph : 

[0039] FIG. 3 shows the signal processing framework contained within the field 
programmable gate array of FIG. 2. 

Detail Description Paragraph : 

[0045] According to the preferred embodiment, the architecture of the invention is 
realized using four Motorola PowerPC [TM] G4 processors in the data transfer path 
topology as disclosed in the related patent application. However, it will be 
recognized by those skilled in the art that the architecture and arrangement of the 
invention is equally applicable and advantageous for realization with any high 
speed microprocessor family or combination of microprocessor models, and especially 
those which are commonly used for control or signal processing applications and 
which exhibit I/O data transfer constraints relative to processing bandwidth. The 
field programmable logic of the preferred embodiment which is responsible for data 
path functions is extended to include a signal processing framework within the data 
path, which can be used as a signal processing resource in conjunction or 
cooperation with the software capabilities of the microprocessors. 

Detail Description Paragraph : 

[0048] Turning to FIG. 1, the module architecture according to the preferred 
embodiment provides four processor nodes (11, 12, 13, 14), each node containing a 
member of the Motorola PowerPC [TM] family of microprocessors and associated support 
circuitry. Each of the processors is interfaced to an external level 2 (L2) cache 
memory, as well as to a programmed field programmable gate array (FPGA) device (17). 

Detail Description Paragraph : 

[0061] As the interprocessor or node-to-node communications path interconnects are 
implemented by buffering and control logic contained in the FPGA programs, and as 
the preferred embodiment utilizes a "hot programmable" FPGA such as the Xilinx 
XCV1600-8-FG1156 [TM] , the quad processor module can be reconfigured at two 
critical times: 

Detail Description Paragraph : 
[0070] The communication paths between the processor nodes are defined by the 
programmed FPGA devices (17) in the preferred embodiment. Each FPGA device provides 
full 64-bit data and 32-bit address connections to the two memory banks local to 
it, in the preferred embodiment. The three paths from the local processor to 
non-local memory (e.g. other processor nodes' local memories) are also 32 bits wide, 
and are write-only, optimized to address the corner-turn processing function in 
two-dimensional signal processing. Alternate embodiments, of course, may use other 
types of logic such as ASICs or co-processors, and may employ various data and 
address bus widths. 

Detail Description Paragraph : 

[0081] Signal Processing Functions Configurably Embedded in Communications Paths 

Detail Description Paragraph : 

[0082] According to the present invention, the FPGA (17) is enhanced to include the 
signal processing node (25) as shown in FIG. 2. The FPGA (17) is configured to have 
one or two PCI bus interfaces (21a, 21b), a direct memory access ("DMA") interface 
(22a, 22b, 22c) to each of the other processing nodes of the module, as well as 
internal bus selectors (26a, 26b) to the memory banks (16). 

Detail Description Paragraph : 

[0085] With this addition of functionality to the FPGAs, our Matched Heterogeneous 
Array Topology Signal Processing System ("MHAT") is realized. One or more signal 
processing functions may be loaded into the DSP node (25) so as to allow data to be 
processed prior to storing in the memory banks (16). MHAT provides a marriage of 
the microprocessors and the FPGAs to facilitate simultaneous data processing and 
data reorganization, which reduces real-time operating system interrupt overhead 
processing and complexity. 

Detail Description Paragraph : 

[0086] Turning to FIG. 3, the internal architecture of a DSP node (25) which 
provides a framework for hosting a variety of signal processing functions (35) is 
shown. The signal processing functions may include operations such as FIR filters, 
digital receivers, digital down converters, fast Fourier transforms ("FFT"), QR 
decomposition, time-delay beamforming, as well as other functions. 

Detail Description Paragraph : 

[0087] Two input data ports (38a, 38b) are provided, each of which receives data 
into an asynchronous first-in first-out ("FIFO") buffer (31a, 31b). The data may then 
be multiplexed, formatted, and masked (33a), and optionally digitally down converted 
(33b) prior to being received into the signal processing logic (35). 

Detail Description Paragraph : 

[0088] After being processed by the signal processing logic (35), the data may 
again be formatted, converted from fixed point representation to floating point 
representation (36), and then loaded into an output asynchronous FIFO for eventual 
output to the output data port (39). 
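The data path of [0087] and [0088] can be sketched as a simplified software model under stated assumptions, not as the FPGA design itself: inputs are taken to be raw 16-bit two's complement words, the multiplexer is assumed to forward one selected port at a time, the optional digital down converter (33b) is omitted, the fixed-point format is assumed to be Q15, and the hosted signal processing logic (35) is stood in for by a caller-supplied function. All function names are hypothetical.

```python
from collections import deque

def to_signed16(word, mask=0xFFFF):
    """Format and mask: apply a bit mask, then read the low 16 bits as two's complement."""
    word &= mask & 0xFFFF
    return word - 0x10000 if word & 0x8000 else word

def q15_to_float(value):
    """Convert a signed Q15 fixed-point integer to a floating-point value."""
    return value / 32768.0

def run_dsp_node(port_a, port_b, select, process, mask=0xFFFF):
    """Move samples through a simplified DSP-node data path.

    port_a, port_b: raw 16-bit words arriving on the two input ports.
    select: 0 or 1, the input FIFO the multiplexer forwards (assumed policy).
    process: fixed-point function standing in for the signal processing logic.
    """
    in_fifos = (deque(port_a), deque(port_b))  # stand-ins for the input FIFOs
    out_fifo = deque()                         # stand-in for the output FIFO
    src = in_fifos[select]                     # multiplexer selection
    while src:
        sample = to_signed16(src.popleft(), mask)  # format and mask
        result = process(sample)                   # hosted processing function
        out_fifo.append(q15_to_float(result))      # fixed- to floating-point
    return list(out_fifo)
```

For example, `run_dsp_node([0x4000, 0xC000], [], 0, lambda s: s >> 1)` halves each sample in fixed point and returns `[0.25, -0.25]`.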

Detail Description Paragraph : 

[0089] FIG. 4 provides more details of an FIR building block (40) which may be 
configured into a portion of the signal processing logic (35). Data received (48) 
from the previous building block, or from the signal processing logic input 
formatters and digital down converters, is placed into the data memory (41). The 
data may then be multiplied (45) by coefficients stored in coefficient memory (43) 
and summed (46) with previous summation results or with summation results (44) from 
other building blocks (401, 402), and the results of these operations are stored in 
channel memory (49). 

Detail Description Paragraph : 

[0092] As such, multiple building blocks may be cascaded by interconnecting data 
inputs, data outputs, summation inputs, and summation outputs. Further, each 
building block may be customized and configured to have specific properties or 
characteristics as defined by the coefficients and control settings stored into the 
control memory (42) and coefficient memory (43), which are loadable by the 
microprocessor. In FIG. 5, a "sum out" connection arrangement (50) of such FIR 
filter building blocks is shown. This may include a single real or complex FIR 
filter (51), multiple filters (52), and digital down converters (53), as well as 
other functions. With this arrangement, a series of signal processing operations 
may be implemented which allows data to be processed in transit from one processing 
node's local memory to the local memory banks of another processor. 
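A rough behavioural model of the FIR building block of FIG. 4 and of cascading such blocks, assuming integer samples, zero initial state, and one sample accepted per step. The class and function names are hypothetical, and control memory (42) is not modelled. Chaining two 4-tap blocks (data out to data in, sum out to sum in) should reproduce a single 8-tap direct-form FIR, which the reference function allows us to check.

```python
from collections import deque

class FirBuildingBlock:
    """Simplified FIR building block: coefficient memory, data memory, channel memory."""

    def __init__(self, coefficients):
        self.coeffs = list(coefficients)                 # coefficient memory
        n = len(self.coeffs)
        self.data = deque([0] * n, maxlen=n)             # data memory (delay line)
        self.channel = 0                                 # channel memory: last result

    def step(self, sample, sum_in=0):
        """Accept one sample; return (sum_out, data_out)."""
        data_out = self.data[-1]          # oldest sample leaves via "data out"
        self.data.appendleft(sample)      # newest sample enters; maxlen drops the oldest
        acc = sum_in                      # "sum in" from a neighbouring block
        for c, x in zip(self.coeffs, self.data):
            acc += c * x                  # multiply-accumulate
        self.channel = acc
        return acc, data_out

def cascade(blocks, samples):
    """Chain blocks data-out -> data-in and sum-out -> sum-in; return final sums."""
    outputs = []
    for x in samples:
        data, total = x, 0
        for blk in blocks:
            total, data = blk.step(data, total)
        outputs.append(total)
    return outputs

def fir_direct(coeffs, samples):
    """Reference direct-form FIR with zero initial state."""
    return [sum(c * samples[n - k] for k, c in enumerate(coeffs) if n >= k)
            for n in range(len(samples))]
```

The design point this illustrates is that each block needs only its own coefficients and delay line; a longer filter is obtained by wiring blocks together rather than by enlarging any single block.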

Detail Description Paragraph : 

[0093] In FIG. 6, a "data out" or cascade connection arrangement (501) of signal 
processing building blocks for a digital receiver is shown. In this example, a 
demodulator (51) is followed by image rejection functions (52), which are in turn 
followed by bandwidth control functions (53), which are followed by the complex 
equalizer (54). 

Detail Description Paragraph : 

[0095] Turning to FIG. 7, the "RT_STAP" benchmark process used to measure the 
performance and functional density of COTS processing modules is shown. This 
particular process represents a task to find targets on a ground surface in a 
signal set acquired from an airborne platform such as an airplane. The benchmark 
process is designed to utilize various portions of processor modules (e.g. DMA, 
memory busses, interrupts, etc.), such that it represents a broad measurement of a 
processing module's capabilities. It also includes a mix of types of processes, 
including simple sample-by-sample calculations on in-phase and quadrature ("I/Q") 
data (73), followed by a pulse compression (74) correlation process, during which a 
corner turning process must be performed to transpose a matrix (71), followed by 
some Doppler processing (75), followed by a "QRD" function (76), which is an 
equations solver for performing adaptive processing. These processes are each well 
known in the art, and are commonly used within various mission profiles often 
performed by such multiprocessor modules. 

Detail Description Paragraph : 

[0097] This mission profile (78) may be met using 8 quad processor modules (80) of 
the type available on the market and previously described, five of which are 
dedicated to the initial processing functions, and three of which are dedicated to 
the latter processing functions, as shown in FIG. 8. 

Detail Description Paragraph : 

[0098] However, by enhancing the QuadPPC board to include the signal processing 
functionality embedded into the interprocessor communication paths according to the 
present invention, this entire mission profile may be realized using only 3 boards 
or modules (81). This results in decreased failure rates by requiring less physical 
hardware, decreased cost, and reduced system characteristics (e.g. weight, 
dimensions, power, etc.). For airborne platforms, reductions in system 
characteristics such as weight, size, and power translate to greater mission 
range and increased aircraft performance and maneuverability. 

CLAIMS : 

1. An autonomous configurable signal processing resource in a multi-processor 
system, said multi-processor system having two or more processor nodes, each 
processor node having a processor, local memory and a communications bus interface, 
also having a plurality of point-to-point communication busses disposed between 
pairs of said communications bus interfaces such that data may be moved across said 
point-to-point busses between said local memories of said processor nodes, and 
further having one or more I/O busses disposed to communicate with all said 
processor nodes as well as with sources and destinations outside said arrangement, 
said autonomous configurable signal processing resource comprising: one or more configurable signal 
processing frameworks selectably disposed in series with one or more communication 
bus or I/O bus such that data traversing a bus may be processed by logic hosted by 
said processing frameworks, and a direct memory access ("DMA") interface to one or 
more of the other processing nodes for receiving data, and an internal bus selector 
for storing data and computational results in a local memory; a bi-directional bus 
selector associated with said local memory for selectively inputting and outputting 
data; and a DMA controller disposed as to communicate with said local memory bus 
selector and said signal processing framework internal bus selector such that said 
signal processing framework may autonomously receive and transmit data selectively 
to and from said local memory, said DMA controller providing retrieval of data from 
said local memory and presentation of said retrieved data to said signal processing 
resource input DMA interface. 

2. The autonomous configurable signal processing resource as set forth in claim 1 
wherein said DMA controller is further adapted to communicate data bidirectionally 
with local memories of one or more additional processing nodes. 

3. The autonomous configurable signal processing resource as set forth in claim 1 
wherein said DMA controller is further adapted to communicate data bidirectionally 
with one or more peripheral component interconnect ("PCI") bus interfaces. 

4. The autonomous configurable signal processing resource as set forth in claim 1 
wherein said local memory comprises at least two memory resources organized as 
ping-pong buffers. 

5. The autonomous configurable signal processing resource as set forth in claim 4 
wherein said ping-pong buffers are adapted for use as one write-only memory 
resource and one read-only resource, said write-only and read-only modes being 
switchable. 